7 research outputs found

    High Performance MP2 for Condensed Phase Simulations

    This report describes the results of a PRACE Preparatory Access Type C project to optimise the implementation of second-order Møller–Plesset perturbation theory (MP2) in CP2K, so that it can be used efficiently on the PRACE Research Infrastructure. The work consisted of three stages: first, serial optimisation of several key computational kernels; second, an OpenMP implementation of the parallel 3D Fourier transform to support mixed-mode MPI/OpenMP use of CP2K; and third, benchmarking the performance gains achieved by the new code on HERMIT for a test case representative of proposed production simulations. Consistent speedups of 8% were achieved in the integration kernel routines as a result of the serial optimisation. When using 8 OpenMP threads per MPI process, speedups of up to 10x were achieved for the 3D FFT, and for some combinations of MPI processes and OpenMP threads, overall speedups of 66% were measured for the whole code. As a result of this work, a proposal for full PRACE Project Access has been submitted.

    Introducing Parallelism to the Ranges TS

    The current interface provided by the C++17 parallel algorithms poses some limitations with respect to parallel data access and heterogeneous systems, such as personal computers and server nodes with GPUs, smartphones, and embedded system-on-a-chip devices. In this paper, we present a summary of why we believe the Ranges TS solves these problems and also improves both programmability and performance on heterogeneous platforms. The complete paper has been submitted to WG21 for consideration, and here we present a summary of the proposed changes alongside new performance results. To the best of our knowledge, this is the first paper presented to WG21 that unifies the Ranges TS with the parallel algorithms introduced in C++17. Although there are various points of intersection, we focus on the composability of functions and the benefit that this brings to accelerator devices via kernel fusion.

    Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs

    In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major difficulties in adapting the code for this operation in ScaLAPACK to SMPSs lie in algorithmic restrictions and the semantics of the SMPSs programming model, but also that both can be overcome with a limited programming effort. The experimental results report considerable gains in performance and scalability of the routine parallelized with SMPSs when compared with conventional approaches to executing the original ScaLAPACK implementation in parallel, as well as with two recent message-passing routines for this operation. In summary, our study opens the door to reusing message-passing legacy codes/libraries for linear algebra by introducing up-to-date techniques, such as dynamic out-of-order scheduling, that significantly upgrade their performance while avoiding a costly rewrite/reimplementation.

    This research was supported by Project EU INFRA-2010-1.2.2 "TEXT: Towards EXaflop applicaTions". The researcher at BSC-CNS was supported by the HiPEAC-2 Network of Excellence (FP7/ICT 217068), the Spanish Ministry of Education (CICYT TIN2011-23283, TIN2007-60625 and CSD2007-00050), and the Generalitat de Catalunya (2009-SGR-980). The researcher at CIMNE was partially funded by the UPC postdoctoral grants under the programme "BKC5-Atracció i Fidelització de talent al BKC". The researcher at UJI was supported by project CICYT TIN2008-06570-C04-01 and FEDER. We thank Jesus Labarta, from BSC-CNS, for helpful discussions on SMPSs and his help with the performance analysis of the codes with Paraver. We thank Vladimir Marjanovic, also from BSC-CNS, for his help in the set-up and tuning of the MPI/SMPSs tools on JuRoPa. Finally, we thank Rafael Mayo, from UJI, for his support in the preliminary stages of this work. The authors gratefully acknowledge the computing time granted on the supercomputer JuRoPa at Jülich Supercomputing Centre.
    Peer Reviewed. Preprint.
